Poster: simon c Date: Jan 27, 2003 1:05am
Forum: macromedia Subject: proposal for CD-ROM archive - comments?

Hey there all,

Following my previous messages on this board (i posted as h0l211), i've been up to the Internet Archive in person to talk about helping out with the CD-ROM archive.

I then wrote up this informal proposal, which Brewster and others suggested that I post here for comments.

If anyone has any feedback about what you'd like to see, and _especially_ technical issues (backup formats and so on), please reply, or email me personally at h0l @ mono211.com if you'd prefer.

thanks again,
simon.

--

INTRODUCTION

This proposal deals with the best way to archive the CD-ROMs in the Internet Archive's Macromedia collection. The collection comprises many thousands of CD-ROMs of PC, Mac, and PC/Mac format, mainly made between the years of 1994 and 2000.

Although a number of people online are (unofficially) archiving console software and game ROMs, nobody is making sure there are perfect digital copies and databases of the PC/Mac CD 'multimedia' boom and bust of the early and mid 90s. This is a _vital_ pre-broadband era where some of the first widely available ideas of 'virtual reality' and cinema-quality 3D graphics for the home were being explored (see 'Myst'!).

Although the Internet has now superseded a lot of the multimedia ideals the Macromedia collection stands for, that's precisely WHY the collection is important - as a document of what the era stood for. As an added incentive, the collection is stored on CD media that decays, and it's not clear how long it will be before these discs lose their reflective surfaces and become unplayable (some people claim 10 to 25 years!).

Making copies of the discs and their artwork now and storing them in a searchable database will help current and future historians of the era, and making the most interesting and relevant material available for download (with the full permission of the copyright holders!) will make people who love abandonware and free software VERY happy.


1. CD-ROM ARCHIVE FORMAT
The first important decision is how best to archive the discs as an exact copy, and then how best to distribute them to the public and use them in other ways.

The official FAQ for the newsgroup alt.binaries.cd.image recommends using an .ISO format for a CD that has one data-only track, and a .bin/.cue format for a Mode 2/Mixed Mode CD - i.e. one that has a data track and multiple audio tracks. Another possibility is that the program is simple enough that the files could be extracted directly to hard disc from, say, a .ZIP file, and they would still run.

So this leaves us with 3 possibilities:

.BIN/.CUE - a 'perfect' digital copy of the disc. Needs to be burnt to disc before it will work, however.
.ISO or .ISO/.WAV - a copy of the disc that should be perfect as long as the disc has no exotic copy protection or multiple audio tracks. You can handle audio tracks as WAVs alongside ISOs, but re-burning them might be confusing.
.ZIP - a zipped-up version of the files contained on the disc.

These formats all have their advantages and disadvantages. I personally think we should discount .ZIP as a format because:

1. It's fairly easy to run ISOs as virtual CD-ROM drives on the PC - there's a simple setup for it. This will mean that we're really providing the CD-ROM 'as is' if we provide an ISO - it's a fairly pure version of the original disc which may also pass security checks to see if the CD-ROM is present.

2. It's also possible to extract files from ISOs easily with the Isobuster utility on PC. So if people don't like having virtual CD-ROM drives, they can just extract the files that way.

3. I wouldn't think .ZIP deals with dual PC/Mac format discs well at all, whereas .BIN/.CUE _should_, and .ISO _might_ - hah!

So .ISO is a good format, but I'm not sure it deals so well with multiple audio tracks. So my temptation right now would be:

- .BIN/.CUE for the 'master' copy of everything.
- .ISO for any CD that only has one data track.
- _maybe_ .ISO and .WAV for CDs with extra audio tracks - we need to research how easy it is to emulate and re-burn these.
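
As a rough illustration of the rule above, here is a minimal Python sketch that reads the TRACK lines of a .CUE sheet and suggests which public format to offer alongside the .BIN/.CUE master. The function names, the returned labels and the example CUE text are all made up for illustration, and the rule only counts tracks (it does not look at Mode 1 vs Mode 2 or at copy protection), so treat it as a starting point for testing rather than a finished tool.

```python
import re

# Matches CUE sheet lines such as:  TRACK 01 MODE1/2352
TRACK_RE = re.compile(r"^\s*TRACK\s+(\d+)\s+(\S+)", re.IGNORECASE)


def list_tracks(cue_text):
    """Return a list of (track_number, track_type) pairs from CUE sheet text."""
    tracks = []
    for line in cue_text.splitlines():
        match = TRACK_RE.match(line)
        if match:
            tracks.append((int(match.group(1)), match.group(2).upper()))
    return tracks


def suggest_public_format(cue_text):
    """Apply the proposed rule; the .BIN/.CUE master is kept in every case."""
    tracks = list_tracks(cue_text)
    if not tracks:
        raise ValueError("no TRACK lines found in CUE sheet")
    audio = [t for t in tracks if t[1] == "AUDIO"]
    data = [t for t in tracks if t[1] != "AUDIO"]
    if len(data) == 1 and not audio:
        return "ISO"                      # one data-only track
    if len(data) == 1 and audio:
        return "ISO+WAV (needs testing)"  # data track plus audio tracks
    return "BIN/CUE only"                 # anything more unusual


# Hypothetical mixed-mode disc: one data track followed by one audio track.
EXAMPLE_CUE = '''FILE "disc.bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 00 12:34:56
    INDEX 01 12:36:56
'''

print(suggest_public_format(EXAMPLE_CUE))
```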

We are, unfortunately, creating twice the data this way, though.

There are some Mac issues that need working through, but Macs can burn .ISO without any trouble, and Toast for Mac can burn .BIN/.CUE. We need to make sure that backing up an .ISO from a PC won't lose the Mac-compatible parts of the disc, mind you - some testing needed.

[Multiple audio tracks are definitely an issue with a minority of the Macromedia collection, by the way, because CD-quality audio was one of the main draws of multimedia at that time, so many applications played music from the CD drive whilst the program was running.]

2. CD-ROM ARTWORK FORMAT

Eventually, the manuals deserve to be scanned in full for posterity, time and funds permitting. Since we're short of both for now, scanning the front and back covers of each CD-ROM and making all of them available online (whether the disc image is available for download or not) would do a LOT to enhance the visual nature and attractiveness of the collection, especially for those titles that can't be downloaded.

So the suggestion for artwork for now is the front and back covers of the CD packaging _OR_ CD case only at the following sizes:

- master offline image - .TIFF at very high scan quality, not intended to be posted on the website.
- master online image - .JPG at size which enables you to read all text. You'll get this when you click on the thumbnails on the website.
- thumbnail online image - .JPG at small size, as with current thumbnails showing on site.
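
To make the three tiers concrete, here is a small Python sketch that works out the pixel dimensions of each derivative from the size of the master scan. The proposal does not give any sizes, so the 1600-pixel and 160-pixel limits, the file names and the example scan size are all assumptions chosen purely for illustration.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    return (max(1, round(width * max_side / longest)),
            max(1, round(height * max_side / longest)))


# Hypothetical targets: the proposal names the three tiers but not their sizes.
TIERS = {
    "master_offline.tiff": None,  # keep the full scan resolution
    "master_online.jpg": 1600,    # assumed to be large enough to read all text
    "thumbnail.jpg": 160,         # assumed thumbnail size
}


def plan_derivatives(scan_width, scan_height):
    """Return {file name: (width, height)} for one scanned cover."""
    plan = {}
    for name, max_side in TIERS.items():
        if max_side is None:
            plan[name] = (scan_width, scan_height)
        else:
            plan[name] = fit_within(scan_width, scan_height, max_side)
    return plan


# Example: a CD booklet cover scanned at 2850 x 2800 pixels (made-up figures).
plan = plan_derivatives(2850, 2800)
for name, size in plan.items():
    print(name, size)
```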


3. 'MACROMEDIA COLLECTION' CD-ROM ARCHIVE CONTENTS

It's important to recognise that ALL of the CD-ROMs in the collection are important. But equally, with such a large number of CDs to sort through, I think the collection should be prioritised into three different areas.

1. PRIORITY - these CD-ROMs should be dealt with first, because they offer information that's not available elsewhere (a museum CD-ROM about totem poles, for example), they're good examples of multimedia from the time (an educational adventure about dinosaurs), or they're good pieces of cultural ephemera (the Betty Ford Clinic promotional CD-ROM or the 'magazine on a disc' ventures.)

2. NON-PRIORITY - these CD-ROMs are still important and should be dealt with when time and funds permit, but they either contain information that is NOT media rich (simple training programs which would be shown on webpages nowadays) or don't have the CD-ROM as their main focus (a music album with a small amount of added multimedia content).

3. JAPANESE-LANGUAGE - I suspect these discs should be separated out, because we need to look at compatibility issues with backing up (can you back up Japanese-language discs if you don't have J-Win installed?) and playing issues (do you need J-Win to run these discs?). If we can work out the compatibility problems, we can then prioritise them into one of the two categories above.

4. 'INTERNET ARCHIVE COLLECTION' CD-ROM ARCHIVE CONTENTS

There should probably be a new collection, at first VERY small, which could be called the 'Internet Archive Collection', since that's who will be assembling it. The point of this is - when we come out and (re)launch the site, there needs to be at least SOME multimedia CD-ROM stuff on there to download that people will get excited about. Some of this may be cherry-picked from elsewhere than the Macromedia archive. Right now I'm particularly thinking of:

1. Voyager Company titles - this was the CD-ROM part of the well-known Criterion Collection laserdiscs and DVDs. We should definitely find out about whether this would be possible.
2. Cyan titles - the earlier pre-Myst titles from Cyan like 'Cosmic Osmo' and 'Manhole' are resoundingly out of print. Got to be worth a try.
3. 'Total Distortion' from Joe Sparks and Pop Rocket – a classic proto-multimedia release from the guy who has now gone on to create Devildoll and Radiskull for Shockwave :)
4. 'Starship Titanic' interests me a lot, but I have no idea whether that would be a possibility. It was Douglas Adams' last CD-ROM project and is now out of print.

The rights issues for some of these are definitely problematic, though - please be aware that this is a wishlist and there's no guarantee ANY of the above will ever appear on the site :)


5. MAKING CD-ROMS REMOTELY ACCESSIBLE

I know this was one of the original goals of the project, and I've been looking a little at the technical issues. The problem definitely seems to be that most of these multimedia CD-ROMs play audio and video files, and I just don't see a possibility of them streaming properly over a normal broadband network with a VNC-like 'PCAnywhere' piece of software running. Looking at messageboards, people are having significant trouble just over their LAN. Simple Director-authored things with easy animations and links might work ok, but that's not necessarily where the meat of the interest in the collection lies, imho.

But it's CERTAINLY worth doing LAN tests to see if things will behave properly, with a view to making machines remotely accessible over either broadband (slowly) or Internet2 (quicker!) if other issues with security and suchlike can be resolved. If this could work, it would rock :)



This post was modified by simon c on 2003-01-27 09:05:43

Poster: Wendell Date: Jan 27, 2003 6:21am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

I disagree with your point about making high-resolution TIFF images unavailable. You're talking about making hundreds of gigs worth of data available for downloading, and, for each, 100MB of "perfect" graphic data which could be further compressed with an archiver is somehow an issue? :-) Seems a bit silly... And I seriously doubt you'd have some overwhelming onslaught of people downloading those files anyway.

Regarding additional titles, this is an interesting concept indeed! I imagine that archive.org has some sort of legal representative who might be able to help "free" these? Personally, I would love to see some of the old way-out-of-print Sierra "talkie" CDs made available. Quality early multimedia! How fondly I recall playing King's Quest V for the first time, and being completely awe-struck when that bloody tree started singing to me. ;-)

And who can forget the ants.

*single tear*

-W

http://www.hoshinori.org/cube/tape/snowing_on_desert_lull.mp3

http://www.hoshinori.org/cube/tape/biciclette_rmx.mp3

Poster: Wendell Date: Jan 27, 2003 6:33am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

I just had a thought about "load distribution"...

Why not make the files additionally available through an intelligent peer-to-peer network such as eDonkey or Overnet? It would simply be a matter of providing an ed2k link with the appropriate checksum in place, and the user would be able to download an ACCURATE version of the image. You could even state outright that archive.org will not provide support if concerned about such hassles. My point is that those who already use such services would know what to do, and I believe it would free up a great deal of your overall resources.
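
The checksum idea works whatever the download route is. Below is a minimal Python sketch of how someone could confirm that a downloaded image matches a published hash. Note the assumptions: ed2k links carry their own MD4-based hash, whereas this sketch uses SHA-1 from Python's standard hashlib purely as a stand-in, and the function names and the tiny example file are invented for the demonstration.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in chunks so a full CD image never has to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha1"):
    """True if the downloaded file hashes to the value published next to it."""
    return file_digest(path, algorithm) == published_hex.strip().lower()


# Demonstration with a three-byte stand-in for a disc image.
with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "example.iso")
    with open(image_path, "wb") as handle:
        handle.write(b"abc")
    # SHA-1 of b"abc", playing the role of the hash published on the site.
    demo_ok = matches_published(image_path, "a9993e364706816aba3e25717850c26c9cd0d89d")

print(demo_ok)
```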

Hey, how about that... A GOOD legal use for these networks!

-W

Poster: simon c Date: Feb 3, 2003 2:22am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

hey,

just wanted to thank everyone for the VERY helpful replies to this post. it's given me a lot more information from the community to work with as i continue to test out possible scenarios for the CD-ROM archive.

thanks again,
simon.

Poster: dbarnes Date: Jan 28, 2003 5:12am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

Please evaluate the feasibility of using PNG, a lossless graphical format, instead of JPG. I want to scream every time I try to read documents that were scanned and converted to JPG format. It's hard on the eyes!

Poster: Gyvrix Date: Jan 28, 2003 9:40am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

You have one of your facts completely wrong, and since I didn't see that anyone has already corrected you, I feel I should. It concerns the .bin/.cue format, which is used quite often and effectively for software distribution, as long as the image maker knows what they are doing (which I'm sure archive.org will). Degradation should certainly not be a problem at the 2nd or 3rd generation, and users will be downloading and burning a 2nd generation copy.

The .bin/.cue format DOES NOT have to be burned to use. It can easily be mounted using the free software Daemon Tools (www.daemon-tools.net) or pay software like Alcohol 120%. Alternatively, a person who has a free copy of Nero (as I received with my last 2 CD burners) can "burn" the .bin/.cue to a Nero image file (.nrg) on their hard drive and mount that with the free Nero virtual CD drive.

Poster: festering leper Date: Jan 28, 2003 10:08am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

i appreciate you bringing your points to this discussion. you are correct in what you say about the bin/cue format being useful without necessarily being burned. i use one of the tools mentioned (daemon tools, great software :) ) and have used it to pull files out or convert to other image types before burning. i guess what i was trying to get across in my first post was the fact that burning a bin/cue isn't the risky part, it's the "ripping" or making the image from the original cd. if there are subtle errors present in the data stream they won't be apparent until someone either burns it or mounts it with a virtual cd program.

i've been noticing raw-mode problems more lately and it got me wondering if drive manufacturers, who try to cater to everyone they can, are making cdr/w drives with larger and less tested featuresets ("we'll put a flash update later, if we hear from people on problem 'x', this drive's gotta ship now")

one of my drives (sony crx-175e) can't make a raw-mode copy of my windows xp disk without corruption but can rip audio cd's with EAC just fine :/ i'm not aware of any protection on the original media and dumping the track in iso format works flawlessly every time from the same disk. the files in question are 2 small *.cat files buried in a deep subdirectory. they copy off the disk just fine but the contents are corrupt. another instance: a few friends and i have been doing work on bootable recovery cdroms, and i've gotten a couple of raw-mode image files from one of them that had corrupt files, but the problem went away if the same disk was extracted to an iso. i haven't trusted the bin/cue format much since then. i know the bin/cue format itself is not flawed or directly to blame for what i've experienced, but it's not very robust either.

a lot of people don't know what can happen and that's the only reason i brought it up, because of personal experience with it. i'd hate for the people here to image a pile of disks and notice subtle problems later :(

i've seen drivers floating around the net that allow one to write a cd in mode2 and put data on it. not me :)

Poster: Wendell Date: Jan 28, 2003 12:51pm
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

CDRWIN, the original program used to create BIN/CUE files, actually has a very decent error-checking system in place, and will, optionally, continually read over "bad" sections in order to find the correct bits.
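
How CDRWIN does this internally isn't described here, so the following is NOT its algorithm, just a generic Python sketch of the "keep re-reading until you trust the result" idea. The read_sector callable is hypothetical (real drive access is left out), and the rule of accepting a sector once two reads agree is an assumption picked for the example.

```python
from collections import Counter


def read_with_retries(read_sector, lba, max_attempts=8, agree=2):
    """Re-read one sector until the same bytes have been seen `agree` times.

    read_sector is a hypothetical callable (lba -> bytes) standing in for
    whatever actually talks to the drive.
    """
    seen = Counter()
    for _ in range(max_attempts):
        data = read_sector(lba)
        seen[data] += 1
        if seen[data] >= agree:
            return data
    raise OSError("sector %d: no %d matching reads in %d attempts"
                  % (lba, agree, max_attempts))


def make_flaky_reader(good, bad_reads):
    """Fake drive for the demo: returns each bad read once, then `good` forever."""
    pending = list(bad_reads)

    def reader(lba):
        return pending.pop(0) if pending else good

    return reader


# One corrupted read followed by consistent good reads.
reader = make_flaky_reader(b"\x01" * 2048, [b"\x00" * 2048])
result = read_with_retries(reader, lba=16)
print(result == b"\x01" * 2048)
```

Two matching reads can of course both be wrong in the same way, so a check like this lowers the risk of a bad rip without removing it.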

-W

Poster: Jacques Richer Date: Jan 27, 2003 6:48am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

As far as the Japanese disks are concerned - if they do not deviate too far from the standard, they should back up into .BIN/.CUE files just fine regardless of what platform you use to read them. (After all, the box doesn't even have to _understand_ the bits, just copy them.) Whether you could _use_ them without having Japanese Windows installed is another matter entirely.

In a nutshell, this approach looks good, and should work fine with any disk which can be copied by a normal, stand alone CD copier.

As far as the artwork, you might want to consider making a JPEG 2000 version in lossless mode for those who want the whole, high quality version. The compression ratio is pretty good and there are decoders which can be downloaded for free. This should give everyone most of what they want.

Poster: festering leper Date: Jan 28, 2003 6:39am
Forum: macromedia Subject: Re: proposal for CD-ROM archive - comments?

i'd just like to point out something about the bin/cue format... while it's a fairly useful format it's not without its problems.

while bin/cue images can consist of cooked sectors, the format is most versatile/useful when used as a collection of raw-mode sector reads (otherwise an iso could easily suffice).

since i can't explain it as well as some others, but have experienced _cdrom_generational_loss_ firsthand - stemming from raw-mode reads, here's a piece from the comp.periph.cdr faq from the section "can i make copies of copies?"

[quote-]
The heart of the problem is the way that the data is read from the source device. When a program does "raw" sector reads, it gets the entire 2352-byte block, which includes the CD-ROM error correction data (ECC) for the sector. Instead of applying the ECC to the sector data, the drive just hands back the entire block, including any errors that couldn't be corrected by the first C1/C2 layer of error correction (see section (2-17)). When the block is written to the CD-R, the uncorrected errors are written along with it.

This problem can be avoided by using "cooked" reads and writes. Rather than create an exact duplicate of the 2352-byte source sector, cooked reads pull off the error-corrected 2048-byte sector. The CD recorder regenerates the appropriate error correction when the data is written. Ideally SNAPSHOT [or other software - ed.] would be able to do the error correction in software when operating in "raw" mode, but apparently there's no readily available code that does this. It could also read each block twice, once in raw mode and once in cooked, but that would double the read time.

This begs the question, why not just use cooked writes all the time? First of all, some recorders (e.g. Philips CDD2000 and HP4020i) don't support cooked writes. (Some others will do cooked but can't do raw, e.g. the Pinnacle RCD-5040.) Second, not all discs use 2048-byte MODE-1 sectors. There is no true "cooked" mode for MODE-2 data tracks; even a block length of 2336 is considered raw, so using cooked reads won't prevent generation loss.

It is important to emphasize that the error correction included in the data sector is a *second* layer of protection. A clean original disc may well have no uncorrectable errors, and will yield an exact duplicate even when copying in "raw" mode. After a few generations, though, the duplicates are likely to suffer some generation loss.

The original version of this quote went on to comment that Plextor and Sony CD-ROM drives were not recommended for making copies of copies. The reason they were singled out is because they are the only drives that explicitly warned about this problem in their programming manuals. It is possible that *all* CD-ROM drives behave the same way. (In fact, it is arguably the correct behavior... you want raw data, you get raw data.)
...
The final answer to this question is, you can safely make copies of copies, so long as the disc is a MODE-1 CD-ROM and you're using "cooked" writes. Copies made with "raw" writes may suffer generation loss because of uncorrected errors. Audio tracks don't have the second layer of ECC, and will be susceptible to the same generation loss as data discs duplicated in "raw" mode. Some drives may turn off some error-correcting features, such as dropped-sample interpolation, during digital audio extraction, or may only use them when extracting at 1x. If you want to find out what your drive is capable of, try extracting the same track from a CD several times at different speeds, then do a binary comparison on the results.
[-end quote]
whether or not the underlying technical details are accurate for today's hardware and software, i have experienced cd generational loss firsthand, and i can vouch for the fact that using raw-mode reads can truly screw your copies over.
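
to make the 2352 vs 2048 byte difference in the quote concrete, here's a small python sketch of how a raw MODE-1 sector is laid out and how the 2048 bytes of user data sit inside it. the function name and the fake test sector are made up for illustration. important caveat: this only slices the bytes, it does NOT apply the ECC, so it is not a substitute for a real cooked read by the drive. any bad bytes in the raw data come straight through.

```python
RAW_SECTOR = 2352   # full raw block, as in the FAQ quote
USER_DATA = 2048    # error-corrected MODE-1 payload size
HEADER = 16         # 12 sync bytes + 3 address bytes + 1 mode byte
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"


def raw_to_cooked(raw_image):
    """Strip sync/header/EDC/ECC from raw MODE-1 sectors, keeping user data.

    No error correction is performed: if the raw read carried bad bytes,
    they are carried over unchanged.
    """
    if len(raw_image) % RAW_SECTOR:
        raise ValueError("length is not a multiple of 2352 bytes")
    cooked = bytearray()
    for offset in range(0, len(raw_image), RAW_SECTOR):
        sector = raw_image[offset:offset + RAW_SECTOR]
        if sector[:12] != SYNC:
            raise ValueError("missing sync pattern at byte %d" % offset)
        if sector[15] != 1:
            raise ValueError("sector at byte %d is not MODE-1" % offset)
        cooked += sector[HEADER:HEADER + USER_DATA]
    return bytes(cooked)


# Fake sector for the demo: sync, address 00:02:00, mode 1, payload,
# then 288 zero bytes where the EDC, reserved bytes and ECC would be.
payload = bytes(range(256)) * 8
fake_sector = SYNC + b"\x00\x02\x00" + b"\x01" + payload + b"\x00" * 288

print(len(fake_sector), raw_to_cooked(fake_sector) == payload)
```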

based on the above and on my own personal experience (been doing copies/extractions/burns for 9+ yrs) i would not recommend the bin/cue format unless great care was taken to ensure that cooked sectors were used where possible. not all software does this for you. extraction followed by testing of copies might be the only way to determine what mode (cooked/raw) will work.
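
for the "testing of copies" part, one simple approach is to rip the same disc twice (or with two drives, or once raw and once cooked after stripping the raw sectors down to user data) and compare the results sector by sector. a minimal python sketch is below. the function name is made up, it works on bytes already in memory, and for real 650 MB images you'd want to read both files in chunks instead.

```python
def differing_sectors(image_a, image_b, sector_size=2048):
    """Return the indexes of sectors whose bytes differ between two rips."""
    if len(image_a) != len(image_b):
        raise ValueError("images differ in length: %d vs %d"
                         % (len(image_a), len(image_b)))
    bad = []
    for start in range(0, len(image_a), sector_size):
        if image_a[start:start + sector_size] != image_b[start:start + sector_size]:
            bad.append(start // sector_size)
    return bad


# Demo: three blank sectors, with one byte flipped in the middle sector
# of the second "rip".
first_rip = bytes(2048 * 3)
second_rip = bytearray(first_rip)
second_rip[2048 + 5] ^= 0xFF

print(differing_sectors(first_rip, bytes(second_rip)))
```

an empty list only tells you the two reads agree with each other, not that both are right, but a non-empty list is a clear sign that something in the ripping setup needs a closer look.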

festering leper (never been 'burned' by cooked mode track reads :) )